{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "fcfcce7a1bd48a2b",
   "metadata": {},
   "source": [
    "# Chapter 37: Value-Based Methods"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7a2b895f",
   "metadata": {},
   "source": [
    "## Exercise 37.1"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "01e4ebdf",
   "metadata": {},
   "source": [
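    "&emsp;&emsp;The Monte Carlo prediction algorithm of Algorithm 37.1 estimates the state-value function. A corresponding algorithm estimates the action-value function; write out that algorithm."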
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a24401c0",
   "metadata": {},
   "source": [
    "**Solution:**  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a26f336d",
   "metadata": {},
   "source": [
    "**Approach:**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3abf0c86",
   "metadata": {},
   "source": [
    "**Steps:**  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7b5ac88a",
   "metadata": {},
   "source": [
    "## Exercise 37.2"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1603e34e",
   "metadata": {},
   "source": [
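    "&emsp;&emsp;In the problem of Example 37.1, assume the discount factor $\\gamma = 0.9$. Use Monte Carlo prediction to estimate the value of policy $\\pi_1$ in every state, using first-visit estimation and every-visit estimation in turn."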
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e06b6cda",
   "metadata": {},
   "source": [
    "**Solution:**  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "112725a8",
   "metadata": {},
   "source": [
    "**Approach:**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8d0ab2f3",
   "metadata": {},
   "source": [
    "**Steps:**  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7bb4b8f9",
   "metadata": {},
   "source": [
    "## Exercise 37.3"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3067f3a1",
   "metadata": {},
   "source": [
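    "&emsp;&emsp;In the problem of Example 37.1, suppose policy $\\pi_2$ moves left at every grid cell. Use Monte Carlo prediction to estimate the value of policy $\\pi_2$ in every state. Compare the values of policies $\\pi_1$ and $\\pi_2$."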
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8ad83031",
   "metadata": {},
   "source": [
    "**Solution:**  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "411bdfdf",
   "metadata": {},
   "source": [
    "**Approach:**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3a84efde",
   "metadata": {},
   "source": [
    "**Steps:**  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bc9c4bd5",
   "metadata": {},
   "source": [
    "## Exercise 37.4"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "efe411cb",
   "metadata": {},
   "source": [
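    "&emsp;&emsp;The TD(0) algorithm of Algorithm 37.2 estimates the state-value function. A corresponding algorithm estimates the action-value function; write out that algorithm."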
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1d66d71e",
   "metadata": {},
   "source": [
    "**Solution:**  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fc28a743",
   "metadata": {},
   "source": [
    "**Approach:**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9024abaa",
   "metadata": {},
   "source": [
    "**Steps:**  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "27ba6917",
   "metadata": {},
   "source": [
    "## Exercise 37.5"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eff422f6",
   "metadata": {},
   "source": [
    "&emsp;&emsp;In the table below, mark the entries for which the answer is “yes”."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "49e4c1cc",
   "metadata": {},
   "source": [
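    "|  | Dynamic programming | Monte Carlo prediction | Temporal-difference prediction |\n",
    "| :---: | :---: | :---: | :---: |\n",
    "| Applicable in the model-free setting |  |  |   |\n",
    "| Applicable to infinite-horizon MDPs |  |  |   |\n",
    "| Does not rely on the Markov assumption |  |  |   |\n",
    "| Converges to the true value in the limit |  |  |   |\n",
    "| Yields an unbiased estimate of the value |  |  |   |"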
   ]
  },
  {
   "cell_type": "markdown",
   "id": "29cad240",
   "metadata": {},
   "source": [
    "**Solution:**  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c74544d0",
   "metadata": {},
   "source": [
    "**Approach:**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "20503f16",
   "metadata": {},
   "source": [
    "**Steps:**  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bd280fcb",
   "metadata": {},
   "source": [
    "## Exercise 37.6"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c8e03ea3",
   "metadata": {},
   "source": [
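    "&emsp;&emsp;Compare the convergence conditions of the Monte Carlo prediction algorithm, the Monte Carlo control algorithm, the TD(0) algorithm, the SARSA algorithm, and Q-learning. Note that these conditions are all sufficient rather than necessary."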
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fc224740",
   "metadata": {},
   "source": [
    "**Solution:**  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "52875025",
   "metadata": {},
   "source": [
    "**Approach:**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d40edba2",
   "metadata": {},
   "source": [
    "**Steps:**  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d0a84e4abb4434",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
